17 Genomics
An alternative to the above is to create a water-in-oil emulsion from library DNA,
PCR reagents, beads to which the DNA can attach, and oil. Each aqueous globule
should contain one bead with one strand of DNA; because of the random nature of
the mixing that creates the emulsion, only 10–20% of the globules (“microwells”)
fulfil this criterion. Using the usual PCR procedure, the DNA fragments are multiply
copied to create the desired clusters of identical strands. These beads can then be
arranged in an array.
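
The low yield of usable globules can be illustrated with a simple occupancy model. The sketch below assumes (this is not stated in the text) that beads and DNA strands are distributed among the globules independently, each following a Poisson distribution; the function names are hypothetical and chosen only for illustration.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of finding exactly k items in a globule when the average is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def usable_fraction(mean_beads: float, mean_strands: float) -> float:
    """Fraction of globules holding exactly one bead AND exactly one DNA strand.

    Assumption: beads and strands are loaded independently (Poisson model).
    """
    return poisson_pmf(1, mean_beads) * poisson_pmf(1, mean_strands)


if __name__ == "__main__":
    # Compare a few average occupancies (same value used for beads and strands).
    for mean in (0.5, 1.0, 2.0):
        fraction = usable_fraction(mean, mean)
        print(f"mean occupancy {mean}: usable fraction {fraction:.3f}")
```

Under these assumptions the usable fraction peaks when the average occupancy is one bead and one strand per globule, where it equals exp(−2), about 13.5%. This is of the same order as the 10–20% quoted above, although real emulsions need not follow the model exactly.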
The key to the parallelization is array-based sequencing of the fragments. Early
NGS used pyrosequencing, but this has been superseded by other methods. Ion Torrent
sequencing takes place by synthesizing a new, complementary DNA strand one base
at a time; each time a new base is added, a hydrogen ion is released and detected by
a semiconductor pH sensor. Inaccuracy can arise when a run of the same base (a
homopolymer) occurs: the longer the run, the more likely its length is to be miscounted by one or more bases. A
more accurate method is “sequencing by ligation” (SOLiD). A primer of N bases is
hybridized to the adapter, and the DNA is then exposed to a collection of octamers,
each of which has one of four fluorescent dyes at the 5’ end and a hydroxyl group at
the 3’ end. Bases 1 and 2 are complementary to the nucleotides to be sequenced, bases
3–5 are immaterial, and 6–8 are inosine bases; a phosphorothioate linkage joins bases
5 and 6. DNA ligase then joins the octamer to the primer, and the fluorescent dye
is then cleaved using silver ions, generating a 5’-phosphate group that can undergo
further ligation. The dye (corresponding to one of the four bases) is identified, the
extension product is melted off, and a second round of sequencing is undertaken
with a primer of N − 1 bases. Although accurate, this method is limited to short read
lengths. Reversible terminator sequencing (Illumina) has two varieties, 3’-O-blocked
and 3’-unblocked. In the first, the target DNA fixed to a solid support is exposed to
the four nucleotides, each with a different fluorophore attached. After binding, the nucleotide is
added to the primer by a DNA polymerase, unincorporated nucleotides are washed away, and the support
is imaged to identify the base. The fluorophore and blocking group are cleaved to regenerate the 3’-OH
terminus and the cycle is then repeated. In the second variety only one fluorophore
is used and the target DNA is exposed to each base in sequence.
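
The difficulty with runs of identical bases in flow-based methods such as Ion Torrent sequencing can be made concrete with a toy model. The sketch below assumes that nucleotides are supplied one kind at a time in a fixed repeating order (the order `TACG` is an arbitrary choice for illustration) and that the signal in each flow is proportional to the number of bases incorporated. The function names and the noisy signal values are hypothetical, not taken from any instrument.

```python
FLOW_ORDER = "TACG"  # assumed repeating order of nucleotide flows (illustrative only)


def ideal_flowgram(new_strand: str, n_flows: int, flow_order: str = FLOW_ORDER) -> list[int]:
    """Noise-free signal per flow: the number of bases incorporated in that flow.

    `new_strand` is the strand being synthesized, written in the order of synthesis.
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        count = 0
        # A whole run of identical bases is incorporated within a single flow.
        while pos < len(new_strand) and new_strand[pos] == base:
            count += 1
            pos += 1
        signals.append(count)
    return signals


def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Convert measured signals back to bases by rounding each to the nearest integer."""
    called = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        called.append(base * round(signal))
    return "".join(called)


if __name__ == "__main__":
    strand = "TTACGGG"
    clean = ideal_flowgram(strand, n_flows=4)
    print(clean, call_bases(clean))
    # Hypothetical noisy measurement: the run of three G's reads as 3.6.
    noisy = [2.1, 0.9, 1.0, 3.6]
    print(noisy, call_bases(noisy))
```

In this toy model the same absolute error that is harmless for a single base (0.9 still rounds to 1) turns the run of three G's into four. This is the sense in which the length of a run becomes uncertain.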
Third generation sequencing uses single molecules, hence avoiding errors introduced
by PCR and, very importantly, allowing much longer lengths of DNA to be
“read”. The technology continues to evolve increasingly rapidly and fourth generation
methods are emerging. Progress is now being hindered by the enormous amounts
of data being generated by the sequencing technologies. For clinical applications,
accuracy and throughput can be enhanced by constraining sequencing to limited
areas of the genome. If a reference genome is available (as it is in the human case),
the sequence fragments can be mapped onto it, greatly improving the speed and reliability
of assembling the complete sequence. For most clinical work, variation from
a canonical sequence is of the greatest interest.
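
The idea of mapping fragments onto a reference and reporting only the differences can be sketched as follows. This is a deliberately naive illustration: it assumes short reads with substitutions only (no insertions or deletions) and tries every ungapped placement by brute force, whereas practical read mappers use indexed, gapped alignment. The function names and the sequences are hypothetical.

```python
def best_ungapped_position(read: str, reference: str) -> tuple[int, int]:
    """Return (offset, mismatches) for the ungapped placement with the fewest mismatches.

    Ties are resolved in favour of the leftmost placement.
    """
    best_offset, best_mismatches = 0, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for r, g in zip(read, window) if r != g)
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


def call_variants(reads: list[str], reference: str) -> list[tuple[int, str, str]]:
    """List (position, reference_base, read_base) for every mismatch in the mapped reads."""
    variants = set()
    for read in reads:
        offset, _ = best_ungapped_position(read, reference)
        for i, base in enumerate(read):
            ref_base = reference[offset + i]
            if base != ref_base:
                variants.add((offset + i, ref_base, base))
    return sorted(variants)


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCT"  # made-up reference sequence
    reads = ["CGTTAGCC", "TAGTCGAT"]     # second read differs from the reference at one base
    print(call_variants(reads, reference))
```

Positions are counted from zero. Only the differences from the reference are reported, which reflects the clinical interest in variation from a canonical sequence rather than in the full sequence itself.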